import pandas as pd
import pandas_datareader.data as pdr
import datetime as dt
import matplotlib.pyplot as plt
from statsmodels.tsa.stattools import adfuller
"ggplot") plt.style.use(
# Load the data
= "2000-01-01"
start_date = dt.datetime.today()
end_date = pdr.DataReader(
df =["SP500", "GDP", "UNRATE"], data_source="fred", start=start_date, end=end_date
name
)
# Rename the variables
= ["S&P500", "GDP", "Unemployment_Rate"]
df.columns
# Convert the data to log differences
# df = df.diff().dropna()
"S&P500_month"] = df["S&P500"].resample("M").mean() df[
Stack the \(n\) variables at time \(t\) into a vector \[ \mathbf{y}_{t}=\left[y_{1,t}\ \ y_{2,t}\ \ \ldots\ \ y_{n,t}\right]^{T} \]
A reduced-form VAR(\(p\)) model is \[ \mathbf{y}_t=\mathbf{G}_0+\mathbf{G}_1 \mathbf{y}_{t-1}+\mathbf{G}_2 \mathbf{y}_{t-2}+\ldots+\mathbf{G}_{p} \mathbf{y}_{t-p}+\mathrm{e}_{t} \]
\[\begin{aligned} & \mathrm{E}\left(\mathrm{e}_t\right)=0 \\ & \mathrm{E}\left(\mathrm{e}_t \mathrm{e}_t^{T}\right)=\Omega \\ & \mathrm{E}\left(\mathrm{e}_t \mathrm{e}_{t-k}^{T}\right)=0 \end{aligned}\] where \(\mathbf{G}_0\) is an \(n\times 1\) vector, each \(\mathbf{G}_i\), \(i\in \{1,\ldots,p\}\), is an \(n\times n\) matrix, and the last term \(\mathrm{e}_{t}\) is a white-noise vector.
A \(p\)-th order VAR is covariance stationary if \[ E\left[\mathbf{y}_t\right]=E\left[\mathbf{y}_{t+{j}}\right] \] \[ E\left[\left(\mathbf{y}_{{t}}-\boldsymbol{\mu}\right)\left(\mathbf{y}_{{t}+{j}}-\boldsymbol{\mu}\right)^{T}\right]=E\left[\left(\mathbf{y}_{{s}}-\boldsymbol{\mu}\right)\left(\mathbf{y}_{{s}+{j}}-\boldsymbol{\mu}\right)^{T}\right]=\boldsymbol{\Gamma}_{{j}} \] where \(t\neq s\)
In a more technical view, rearrange the VAR model \[ \left(\boldsymbol{I}_n-\boldsymbol{G}_1 L-\boldsymbol{G}_2 L^2-\ldots-\boldsymbol{G}_p L^p\right) \boldsymbol{y}_t=\boldsymbol{G}_0+\boldsymbol{e}_t \] Replace the lag polynomial by \(\boldsymbol{G}(L)\) \[ \boldsymbol{G}(L) \boldsymbol{y}_t=\boldsymbol{G}_0+\boldsymbol{e}_t \] For the VAR to be stationary, \(\boldsymbol{G}(L)\) must be invertible, which requires all roots of the characteristic equation to lie outside the unit circle.
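As a quick numerical check of this condition, here is a minimal sketch with hypothetical coefficient matrices \(G_1\) and \(G_2\): stack the lag matrices into the companion matrix and verify that all of its eigenvalues lie inside the unit circle, which is equivalent to all roots of the characteristic equation lying outside it.

import numpy as np

# Hypothetical VAR(2) coefficient matrices, purely for illustration
G1 = np.array([[0.5, 0.1],
               [0.2, 0.3]])
G2 = np.array([[0.1, 0.0],
               [0.0, 0.1]])

n = G1.shape[0]
companion = np.zeros((2 * n, 2 * n))   # [[G1, G2], [I, 0]]
companion[:n, :n] = G1
companion[:n, n:] = G2
companion[n:, :n] = np.eye(n)

eigenvalues = np.linalg.eigvals(companion)
print(np.all(np.abs(eigenvalues) < 1))  # True => this VAR(2) is stationary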
A stationary \(\text{VAR}\) is equivalent to a \(\text{VMA}(\infty)\) \[ \mathbf{y}_t=\boldsymbol{G}(L)^{-1}\boldsymbol{G}_0+\boldsymbol{G}(L)^{-1}\boldsymbol{e}_t = \boldsymbol{\mu} +\boldsymbol{G}(L)^{-1}\boldsymbol{e}_t \] The remaining steps are similar to those for \(\text{ARIMA}\) \[ \begin{gathered} \mathbf{y}_{t}=\boldsymbol{\mu}+\left(\mathbf{I}_{n}+\boldsymbol{\Psi}_1 {L}+\boldsymbol{\Psi}_2 {L}^2+\ldots\right) \boldsymbol{e}_{t} \\ \mathbf{y}_{t}=\boldsymbol{\mu}+\boldsymbol{e}_{t}+\boldsymbol{\Psi}_1 \boldsymbol{e}_{t-1}+\boldsymbol{\Psi}_2 \boldsymbol{e}_{t-2}+\ldots \\ \mathbf{y}_{t}=\boldsymbol{\mu}+\sum_{i=0}^{\infty} \boldsymbol{\Psi}_{i} \boldsymbol{e}_{t-i}, \quad \boldsymbol{\Psi}_0=\mathbf{I}_n \end{gathered} \]
VAR models serve two purposes: forecasting and structural analysis. The former is done with a reduced-form VAR, the latter usually with a structural VAR (SVAR).
Because the reduced form has no identification issue, each equation can be estimated directly by OLS.
An example of a VAR(1) without a constant term:
\[ \left(\begin{array}{l} Y_t \\ X_t \end{array}\right)=\left(\begin{array}{ll} a_{11} & a_{12} \\ a_{21} & a_{22} \end{array}\right)\left(\begin{array}{l} Y_{t-1} \\ X_{t-1} \end{array}\right)+\left(\begin{array}{l} \epsilon_{Y, t} \\ \epsilon_{X, t} \end{array}\right) \]
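To make the equation-by-equation OLS point concrete, here is a minimal simulation sketch (the coefficient matrix and sample size are made up): each equation of the reduced-form VAR(1) is just an OLS regression of one variable on the lagged values of all variables.

import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
A = np.array([[0.6, 0.2],
              [0.1, 0.5]])          # hypothetical "true" coefficient matrix
T = 500
y = np.zeros((T, 2))
for t in range(1, T):
    y[t] = A @ y[t - 1] + rng.standard_normal(2)

X = y[:-1]                          # lagged values as regressors
for j in range(2):                  # one OLS regression per equation
    print(sm.OLS(y[1:, j], X).fit().params)   # close to the j-th row of A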
To illustrate the estimation process, we import quarterly GDP and monthly consumer sentiment from FRED. To take advantage of the more abundant monthly observations of consumer sentiment, quarterly GDP is resampled to monthly frequency by linear interpolation.
start = dt.datetime(1980, 1, 1)
end = dt.datetime.today()
df = pdr.DataReader(["GDP", "UMCSENT", "BSCICP03USM665S"], "fred", start=start, end=end)
df = df.dropna()
df.columns = ["GDP", "Consumer_confidence", "Manu_confidence"]
df["GDP"] = df["GDP"].resample("MS").interpolate(method="linear")
df.head()
| DATE | GDP | Consumer_confidence | Manu_confidence |
|---|---|---|---|
| 1980-01-01 | 2789.842 | 67.0 | 98.62623 |
| 1980-04-01 | 2797.352 | 52.7 | 96.38105 |
| 1980-07-01 | 2856.483 | 62.3 | 96.30575 |
| 1980-10-01 | 2985.557 | 75.0 | 100.24680 |
| 1981-01-01 | 3124.206 | 71.4 | 99.50925 |
def adf_test(x):
    indices = ["Test Statistic", "p-value", "# of Lags Used", "# of Observations Used"]

    adf_results_all = adfuller(
        x, autolag="BIC"
    )  # you can try "AIC" too, I personally prefer "BIC"
    results = pd.Series(adf_results_all[0:4], index=indices)

    for key, value in adf_results_all[4].items():
        results["Critical Value ({})".format(key)] = value
    results = pd.DataFrame(results, columns=["statistics"])
    return results
p_value = {}
for i in df.columns:
    p_value[i] = adf_test(df[i]).iloc[1, 0]  # the p-value row
p_value
{'GDP': 1.0,
'Consumer_confidence': 0.011101346251876653,
'Manu_confidence': 5.362683319661379e-09}
"GDP_growth"] = df["GDP"].pct_change()
df[= df.drop(["GDP"], axis=1) df
= df.dropna() df
from statsmodels.tsa.api import VAR
var_model = VAR(df)
var_results = var_model.fit(verbose=True, ic="bic", method="ols")
<statsmodels.tsa.vector_ar.var_model.LagOrderResults object. Selected orders are: AIC -> 2, BIC -> 2, FPE -> 2, HQIC -> 2>
Using 2 based on bic criterion
Model Selection Criteria
We have seen \(\text{AIC}\) and \(\text{BIC}\) many times in printed linear regression results; the former tends to favor overfitted models and the latter underfitted ones.
In an \(\text{ARIMA}\) setting it can be reasonable to choose an overparameterized model if it has a sound theoretical foundation. In a \(\text{VAR}\) setting, however, we should select a parsimonious model whenever possible, since VARs are intensively parameterized; that means we should use \(\text{BIC}\) unless there is an obvious reason to deviate.
The Hannan-Quinn Information Criterion (\(\text{HQIC}\)), Final Prediction Error (\(\text{FPE}\)) and Det(Omega_mle) are also provided, but they rarely add extra value in most cases.
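In statsmodels these criteria can be tabulated across candidate lag lengths before fitting; a short sketch using the var_model defined above (the choice of maxlags=8 is arbitrary):

order_selection = var_model.select_order(maxlags=8)
print(order_selection.summary())        # AIC, BIC, FPE, HQIC for each lag
print(order_selection.selected_orders)  # lag chosen by each criterion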
Strategy of Choosing Variables
Construct a core VAR, then rotate extra variables in one at a time, to avoid estimating too many coefficients at once.
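A rough sketch of the rotation idea, assuming a core of GDP growth and consumer confidence with the remaining series as candidates (this core/candidate split is only illustrative):

core_cols = ["GDP_growth", "Consumer_confidence"]
candidates = ["Manu_confidence"]        # rotate these in one at a time
for extra in candidates:
    res = VAR(df[core_cols + [extra]]).fit(ic="bic")
    print(extra, "lags:", res.k_ar, "BIC:", res.bic)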
Forecasting
Denote the information set up to time \(t-1\) as \(\mathbf{Y}_{t-1}=\left(\mathbf{y}_{t-1}, \mathbf{y}_{t-2}, \ldots, \mathbf{y}_{t-T}\right)\). The one-step-ahead forecast is the conditional expectation \[ E\left[\mathbf{y}_{t} \mid \mathbf{Y}_{t-1}\right]=\widehat{\mathbf{G}}_0+\widehat{\mathbf{G}}_1 \mathbf{y}_{t-1}+\widehat{\mathbf{G}}_2 \mathbf{y}_{t-2}+\ldots+\widehat{\mathbf{G}}_{p} \mathbf{y}_{t-p} \]
This is a dynamic forecast; iterating one period forward: \[ E\left[\mathbf{y}_{t+1} \mid \mathbf{Y}_{t-1}\right]=\widehat{\mathbf{G}}_0+\widehat{\mathbf{G}}_1 E\left[\mathbf{y}_t \mid \mathbf{Y}_{t-1}\right]+\widehat{\mathbf{G}}_2 \mathbf{y}_{t-1}+\ldots+\widehat{\mathbf{G}}_{p} \mathbf{y}_{t-p+1} \] Iterating \(j\) periods forward: \[ E\left[\mathbf{y}_{t+j} \mid \mathbf{Y}_{t-1}\right]=\widehat{\mathbf{G}}_0+\widehat{\mathbf{G}}_1 E\left[\mathbf{y}_{t+j-1} \mid \mathbf{Y}_{t-1}\right]+\widehat{\mathbf{G}}_2 E\left[\mathbf{y}_{t+j-2} \mid \mathbf{Y}_{t-1}\right]+\ldots+\widehat{\mathbf{G}}_{p} \mathbf{y}_{t-p+j} \]
var_results.summary()
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Sun, 27, Oct, 2024
Time: 17:41:19
--------------------------------------------------------------------
No. of Equations: 3.00000 BIC: -5.81182
Nobs: 174.000 HQIC: -6.03842
Log likelihood: -180.888 FPE: 0.00204378
AIC: -6.19308 Det(Omega_mle): 0.00181571
--------------------------------------------------------------------
Results for equation Consumer_confidence
=========================================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------------------
const 41.763559 48.461342 0.862 0.389
L1.Consumer_confidence 0.627829 0.081376 7.715 0.000
L1.Manu_confidence 0.630848 0.817240 0.772 0.440
L1.GDP_growth 91.988993 45.588738 2.018 0.044
L2.Consumer_confidence 0.251740 0.081485 3.089 0.002
L2.Manu_confidence -0.954969 0.780237 -1.224 0.221
L2.GDP_growth -15.151987 43.699814 -0.347 0.729
=========================================================================================
Results for equation Manu_confidence
=========================================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------------------
const 22.980896 4.693070 4.897 0.000
L1.Consumer_confidence 0.004026 0.007881 0.511 0.609
L1.Manu_confidence 1.158774 0.079143 14.642 0.000
L1.GDP_growth -2.486736 4.414883 -0.563 0.573
L2.Consumer_confidence -0.005678 0.007891 -0.720 0.472
L2.Manu_confidence -0.386287 0.075559 -5.112 0.000
L2.GDP_growth -7.002262 4.231957 -1.655 0.098
=========================================================================================
Results for equation GDP_growth
=========================================================================================
coefficient std. error t-stat prob
-----------------------------------------------------------------------------------------
const -0.090908 0.089401 -1.017 0.309
L1.Consumer_confidence -0.000219 0.000150 -1.460 0.144
L1.Manu_confidence 0.003879 0.001508 2.573 0.010
L1.GDP_growth -0.013218 0.084102 -0.157 0.875
L2.Consumer_confidence 0.000151 0.000150 1.005 0.315
L2.Manu_confidence -0.002791 0.001439 -1.939 0.053
L2.GDP_growth 0.128138 0.080617 1.589 0.112
=========================================================================================
Correlation matrix of residuals
Consumer_confidence Manu_confidence GDP_growth
Consumer_confidence 1.000000 0.390754 0.250565
Manu_confidence 0.390754 1.000000 0.435382
GDP_growth 0.250565 0.435382 1.000000
var_results.params
| | Consumer_confidence | Manu_confidence | GDP_growth |
|---|---|---|---|
| const | 41.763559 | 22.980896 | -0.090908 |
| L1.Consumer_confidence | 0.627829 | 0.004026 | -0.000219 |
| L1.Manu_confidence | 0.630848 | 1.158774 | 0.003879 |
| L1.GDP_growth | 91.988993 | -2.486736 | -0.013218 |
| L2.Consumer_confidence | 0.251740 | -0.005678 | 0.000151 |
| L2.Manu_confidence | -0.954969 | -0.386287 | -0.002791 |
| L2.GDP_growth | -15.151987 | -7.002262 | 0.128138 |
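With the fitted coefficients above, dynamic forecasts follow the recursion from the equations in this section; a minimal sketch (12 steps is an arbitrary horizon):

lag_order = var_results.k_ar                               # 2 here
point_forecast = var_results.forecast(df.values[-lag_order:], steps=12)
print(pd.DataFrame(point_forecast, columns=df.columns))

var_results.plot_forecast(steps=12)                        # forecasts with confidence bands
plt.show()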
Structural VAR
The Fed expects inflation to rise, so it hikes rates to bring inflation down. Could it be the other way around? This is a causality question.
Similarly, when the government expects household demand to weaken, tax-cutting policies are put on the table.
Both monetary and fiscal policy react endogenously to other variables. The key insight is that you cannot measure how effective a policy is while it is itself reacting to those variables.
To measure the effect of a policy, we should treat it as an exogenous shock to the economy and measure how the other economic variables respond to the policy shock; technically, this is done with the impulse response function.
Determining the form of the structural model is called identification.
\[ A X_t=\beta_0+\beta_1 X_{t-1}+\mathrm{u}_t \]
\[ \underbrace{\left[\begin{array}{cc} 1 & a_{12} \\ a_{21} & 1 \end{array}\right]}_{\mathrm{A}}\left[\begin{array}{c} y_t \\ r_t \end{array}\right]=\left[\begin{array}{l} \beta_{10} \\ \beta_{20} \end{array}\right]+\left[\begin{array}{ll} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{array}\right]\left[\begin{array}{l} y_{t-1} \\ r_{t-1} \end{array}\right]+\left[\begin{array}{l} u_{y t} \\ u_{r t} \end{array}\right] \]
\(A\) contains the contemporaneous relations among endogenous variables.
\[ {A}^{-1} {AX}_t=\overbrace{{A}^{-1} \beta_0}^{G_0}+\overbrace{{A}^{-1} \beta_1}^{G_1} {X}_{t-1}+{A}^{-1} u_t \]
\[ {X}_t={G}_0+{G}_1 {X}_{t-1}+e_t, \quad {~A}^{-1} {A}={I}, \quad {e}_t={A}^{-1} u_t \]
The forecast error \(e_t\) is a linear combination of the structural shocks \(u_t\).
Identification
An SVAR cannot be estimated directly since it is a theoretical framework; we have to start from estimating the reduced-form VAR model.
As an example \[ \left[\begin{array}{l} y_t \\ r_t \end{array}\right]=\left[\begin{array}{l} g_{10} \\ g_{20} \end{array}\right]+\left[\begin{array}{ll} g_{11} & g_{12} \\ g_{21} & g_{22} \end{array}\right]\left[\begin{array}{l} y_{t-1} \\ r_{t-1} \end{array}\right]+\left[\begin{array}{l} e_{y t} \\ e_{r t} \end{array}\right]\Rightarrow\begin{aligned} & y_t=g_{10}+g_{11} y_{t-1}+g_{12} r_{t-1}+e_{y t} \\ & r_t=g_{20}+g_{21} y_{t-1}+g_{22} r_{t-1}+e_{r t} \end{aligned} \] There are \(9\) parameters to estimate: \(6\) coefficients, \(2\) variances (of residuals) and \(1\) covariance (of residuals).
\[ \left[\begin{array}{cc} 1 & a_{12} \\ a_{21} & 1 \end{array}\right]\left[\begin{array}{c} y_t \\ r_t \end{array}\right]=\left[\begin{array}{l} \beta_{10} \\ \beta_{20} \end{array}\right]+\left[\begin{array}{ll} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{array}\right]\left[\begin{array}{c} y_{t-1} \\ r_{t-1} \end{array}\right]+\left[\begin{array}{l} u_{y t} \\ u_{r t} \end{array}\right] \]
There are \(10\) parameters to estimate: \(8\) coefficients and \(2\) variances (of residuals); the covariance is zero due to the independence of the structural shocks. However, the reduced-form VAR can give us at most \(9\) parameters, so we need to impose at least one restriction.
This is called identification of the SVAR model: we use a priori knowledge to impose restrictions on the matrix \(A\).
\[ \underbrace{\left[\begin{array}{cc} 1 & 0 \\ a_{21} & 1 \end{array}\right]}_{{A}}\left[\begin{array}{l} y_t \\ r_t \end{array}\right]=\left[\begin{array}{l} \beta_{10} \\ \beta_{20} \end{array}\right]+\left[\begin{array}{ll} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{array}\right]\left[\begin{array}{l} y_{t-1} \\ r_{t-1} \end{array}\right]+\left[\begin{array}{l} u_{y t} \\ u_{r t} \end{array}\right] \]
\[ \left[\begin{array}{l} y_t \\ r_t \end{array}\right]=\overbrace{\left[\begin{array}{cc} 1 & \mathbf{0} \\ -a_{21} & 1 \end{array}\right]}^{{A}^{-1}}\left[\begin{array}{l} \beta_{10} \\ \beta_{20} \end{array}\right]+\overbrace{\left[\begin{array}{cc} 1 & \mathbf{0} \\ -a_{21} & 1 \end{array}\right]}^{{A}^{-1}}\left[\begin{array}{ll} \beta_{11} & \beta_{12} \\ \beta_{21} & \beta_{22} \end{array}\right]\left[\begin{array}{l} y_{t-1} \\ r_{t-1} \end{array}\right]+\overbrace{\left[\begin{array}{cc} 1 & \mathbf{0} \\ -a_{21} & 1 \end{array}\right]}^{A^{-1}}\left[\begin{array}{l} u_{y t} \\ u_{r t} \end{array}\right] \]
It eventually becomes a reduced-form VAR.
\[ \left[\begin{array}{c} y_t \\ r_t \end{array}\right]=\left[\begin{array}{c} \beta_{10} \\ -a_{21} \beta_{10}+\beta_{20} \end{array}\right]+\left[\begin{array}{cc} \beta_{11} & \beta_{12} \\ -a_{21} \beta_{11}+\beta_{21} & -a_{21} \beta_{12}+\beta_{22} \end{array}\right]\left[\begin{array}{c} y_{t-1} \\ r_{t-1} \end{array}\right]+\left[\begin{array}{c} u_{y t} \\ -a_{21} u_{y t}+u_{r t} \end{array}\right] \]
Every parameter estimated from the reduced-form VAR helps to recover the parameters of the structural form: \(9\) equations for \(9\) unknowns, for example \[ \beta_{10} = g_{10},\qquad -a_{21}\beta_{10}+\beta_{20}= g_{20} \]
To summarize, the identification process simply imposes restrictions on the matrix \(A\) based on a priori knowledge or intuition.
Restrictions
How many restrictions do we need to impose? It is a simple counting exercise.
Suppose you have \(n\) variables; then the matrix \(A\) is \(n\times n\). The diagonal of \(A\) must be \(1\)'s, leaving \(n^2-n\) unknown elements in \(A\); adding the \(n\) unknown variances of the structural shocks gives \(n^2-n+n=n^2\) unknowns.
The reduced form provides \(n+\frac{n^2-n}{2}=\frac{n^2+n}{2}\) knowns: the \(n\) diagonal elements plus the \(\frac{n^2-n}{2}\) off-diagonal elements of the residual covariance matrix \(E (e_t e_t^{T})=\Sigma_e\).
The number of restrictions we need to impose is therefore \[ n^2 - \frac{n^2+n}{2} = \frac{n^2-n}{2} \]
def getRestriction(n):
    return int(n**2 - (n**2 + n) / 2)

getRestriction(7)
21
Christopher Sims suggested a strategy called recursive ordering for identification, which corresponds to a Cholesky decomposition of the residual covariance matrix.
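Under recursive ordering, the first variable in the ordering responds contemporaneously to no other shock, the second only to the first, and so on. A minimal sketch on the fitted model (the result depends on the column ordering of df):

import numpy as np

sigma_u = np.asarray(var_results.sigma_u)   # reduced-form residual covariance
P = np.linalg.cholesky(sigma_u)             # lower triangular: e_t = P u_t, u_t orthogonal
print(pd.DataFrame(P, index=df.columns, columns=df.columns))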
Impulse Response
The Wold (VMA) representation is used to evaluate the impulse response function (IRF) \[ {X}_t=\mu+\sum_{i=0}^{\infty} \mathrm{C}_i {U}_{t-i} \] where element \((j,k)\) of \(\mathrm{C}_i\) gives the response of variable \(j\), \(i\) periods after a unit shock to variable \(k\).
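In statsmodels the impulse responses are computed from this MA representation; a minimal sketch on the fitted model (20 periods is an arbitrary horizon, orth=True uses the Cholesky-orthogonalized shocks):

psi = var_results.ma_rep(maxn=20)   # the C_i matrices above
print(psi.shape)                    # (21, 3, 3): C_0 through C_20
irf = var_results.irf(periods=20)
irf.plot(orth=True)
plt.show()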
Variance Decomposition
Forecast error variance decomposition tells us how much of the movement in each variable is due to the shock of interest versus the other shocks.
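A minimal sketch using the fitted model (a 20-period horizon chosen arbitrarily); the decomposition uses the same Cholesky ordering as the orthogonalized IRF:

fevd = var_results.fevd(periods=20)
fevd.summary()    # share of each variable's forecast error variance by shock
fevd.plot()
plt.show()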